Advanced CKD detection through optimized metaheuristic modeling in healthcare informatics

Data classification is a central concern in medical data analysis for predicting and detecting illnesses, and it is widely applied in modern healthcare informatics. Machine learning and deep learning models have attracted great attention for classifying medical data and improving illness detection. However, existing techniques face fundamental problems, such as high feature dimensionality, computational complexity, and long execution times. This study presents a novel classification model that employs metaheuristic methods to improve the detection of Chronic Kidney Disease (CKD). The medical data is first pre-processed extensively: missing values are resolved, the data is transformed, and normalization procedures are applied, so that the data is clean and ready for deeper analysis. We adopt the Binary Grey Wolf Optimization (BGWO) method, a reliable metaheuristic for feature subset selection, to improve illness prediction accuracy. In the classification step, the model adopts the Extreme Learning Machine (ELM) with an optimized number of hidden nodes to predict the presence of CKD. The classifier and the feature selection are evaluated using established measures, including recall, specificity, kappa, F-score, and accuracy. The results show that the proposed approach achieves high accuracy, outperforming the existing models.

• CKD (Chronic Kidney Disease)
• NKD (No Kidney Disease)
• AKI (Acute Kidney Injury)
• ESKD (End-Stage Kidney Disease)
The research aims to diagnose chronic kidney disease and identify individuals with no kidney disease. CKD affects about 10% of the population, killing millions every year. Hypertension, obesity, family background, medicines, age, and race are all associated with CKD. The medical sector has an urgent and escalating need for a method to detect CKD as early as possible 3 .
Chronic kidney disease is associated with a variety of underlying conditions. The escalating frequency of CKD and the complex nature of its causal agents underscore the need for better diagnostic methods. Even though we have

Literature review
The accurate prognosis of CKD remains a significant difficulty within the healthcare field. Over the years, numerous researchers have investigated methodologies to enhance the precision and timeliness of CKD detection, primarily by refining data pre-processing techniques, feature selection methods, and classification approaches. AlMuhaideb and Menai 26 emphasized the importance of pre-processing techniques, stressing the significance of feature subset selection and the complexities of managing missing data. By using an ant colony metaheuristic approach in classification, they attained improved predictive accuracy, occasionally exceeding 60%. However, the issue of ensuring model transparency and interpretability persisted.
Akben 27 achieved notable progress by integrating methodologies for medical data classification, using K-Means for pre-processing and classification techniques such as KNN, SVM, and NB. The method relied on assessing urine test features and reached an accuracy of 97.8%, especially among individuals aged 35 years and older. That investigation indicated that different combinations of dataset characteristics yielded accuracies ranging from 83.75% to 97.8%, and that certain combinations required additional processing time. Yang et al. 28 studied the Iterative Dichotomiser 3 (ID3) approach, which applies a heuristic to simplify the data classification process. Although the ID3 technique showed promise, it had significant drawbacks, notably the need to partition dataset values. Shen et al. 29 proposed the SVM-FOA method, which performs better than artificial neural networks (ANNs) across multiple metrics. Nonetheless, the underlying issues of maintaining model transparency and handling high-dimensional datasets persisted.
Khamparia et al. 20 introduced an approach that employs a deep stacked autoencoder (DSAE) to classify CKD by applying multimedia data learning methodologies. The study aimed to predict CKD during its earliest phases, helping lower treatment expenditures for affected persons. That work introduced the deep stacked autoencoder as a novel strategy for recognizing CKD. The stacked autoencoder (SAE) design has two autoencoders and a softmax classifier. Nevertheless, a single stacked autoencoder proved ineffective in lowering the dimensionality of the input features.
Moreover, the authors of 30 developed a prediction model for CKD using several resampling methods and ML classifiers. The resampling methods include the synthetic minority oversampling technique (SMOTE) and Borderline-SMOTE, while the classifiers include the decision tree, AdaBoost, and random forest. The findings indicate that the decision tree combined with SMOTE exhibited the highest performance, with an accuracy of 98.99%. Ordinarily, ML studies use attributes such as hemoglobin level, albumin, red blood cell count, white blood cell count, blood pressure, packed cell volume, specific gravity, and others to identify patients at high risk of CKD. In turn, these patients can receive timely and cost-effective medical intervention from medical practitioners. Although much focus has been given to machine learning in predicting CKD, few studies explicitly address identifying the key features required for CKD detection [31][32][33] . If accurately identified in individuals suspected of having CKD, these features could be employed for efficient computer-assisted CKD diagnosis.
The findings of Salekin and Stankovic 34 were supported by their investigation into feature selection, where they used wrapper and filter procedures. Elhoseny et al. 35 conducted a comprehensive examination of a classification and prediction framework for chronic kidney disease. Their Density-based feature selection with Ant Colony-based Optimization (D-ACO) method aimed to address the identification and classification of CKD within healthcare services. The reported performance measurements showed improved overall efficacy. The primary drawbacks of this optimization approach were early convergence and a lack of improvement in output. In study 36 , the authors enhance CKD diagnosis through classification and efficient feature selection methodologies. The methodology applies the Oppositional FireFly Optimization (OFFO) algorithm to discover relevant features, which are then combined with a Deep Neural Network (DNN) for classification. The article highlights the strong performance of the DNN model, which achieved an accuracy of 98.89% in classifying CKD.
In study 37 , the authors applied various ML procedures to a dataset of 400 patients. Their methodology, which encompassed replacing missing values and fine-tuning parameters, exhibited promising results. Nevertheless, the main obstacles to model transparency and the management of complex datasets have yet to be resolved. In a related study, the authors of 38 examined the application of Bayesian networks in identifying at-risk pregnancies, while the authors of 39 focused on predicting illness comorbidities through weighted association rule mining. Although not directly associated with CKD, both studies emphasized the adaptability of machine learning.
In their study, Jongbo et al. 40 analyzed ensemble techniques, specifically bagging and random subspace, for constructing a diagnostic model for CKD. The primary goal was to increase the classification accuracy of the Decision Tree, Naïve Bayes, and k-Nearest Neighbors classifiers. A notable result was achieved with the k-Nearest Neighbors classifier, which made precise predictions using a random subspace ensemble. However, the approach may incur increased cost because the healthcare training samples must be stored in full. The authors of references [41][42][43] have made notable contributions by introducing distinct approaches and algorithms to improve the prediction of CKD. Nevertheless, the fundamental obstacles of computational complexity, interpretability, and the management of high-dimensional datasets persisted.
Within the domain of medical data classification, where feature selection and classification are phases of significant importance, this study explores the complexities associated with classification algorithms and feature selection techniques. The issues encountered in the prediction of CKD are complex and diverse. According to Zhao et al. 44 , a noteworthy problem is precisely recognizing the presence of CKD while ensuring that critical data items from the input dataset are not lost. Furthermore, it has been noted by 37 that typical Artificial Neural Networks (ANNs) are insufficient in their descriptive capabilities for nuanced medical data processing. A deep learning method is employed to address CKD data overtraining.
Nevertheless, the Particle Swarm Optimization (PSO) method can gravitate towards local optima owing to its low convergence rate. In addition, kernel-based support vector machine (SVM) classifiers have been shown to produce misclassifications and omissions, notably when dealing with datasets of varied sizes 29,45 . The suggested approach applies BGWO for feature selection and the ELM model for classification. This approach offers a practical answer to concerns about execution time, loss of crucial data, and excessive feature dimensions. This result implies substantial progress in accurately predicting CKD and setting new standards in medical data classification.
The proposed work makes the following key contributions:

Dataset
The model under consideration utilizes the UCI Dataset 46 . This dataset is specifically designed to support studies focused on CKD detection. It comprises 25

Methodology
This study introduces an innovative methodology to address the complexities of forecasting patient ailments. This research aims to improve the precision of disease prediction by applying sophisticated medical data classification methods. Our primary objective is to boost classification performance on the CKD dataset by employing a rigorous feature selection process. In the subsequent sections, we explain the comprehensive approach used in our study.
Stage 1. Pre-processing: In the first step of our study, we use a CKD dataset provided by the UCI machine learning repository. Real-world medical datasets are widely recognized to contain noise, missing entries, and inconsistencies, principally because of the different sources and sizes of these databases. A thorough pre-processing approach is therefore applied to the CKD dataset to improve overall data quality. It involves transforming the data, addressing missing data, and normalizing the data. This not only improves the accuracy of the predictions but also preserves the integrity of the data set.
Stage 2. Feature Selection: A thorough feature selection process must be conducted to achieve the best possible classification accuracy. This study employs the BGWO approach to meet this objective. The BGWO algorithm was pivotal in this stage.
ELM was applied in the last stage of the study to classify the existence of CKD. The ELM model uses the selected features and runs the learning process to determine from a medical data set whether a person has CKD. The algorithm has gained wide acceptance due to its strong performance in machine learning and its ability to extract complicated patterns from data. These capabilities make it suitable for classifying medical data and help ensure reliable and robust results.
The main objective of the research was to improve the accuracy of CKD diagnosis with a modified method that handles the disease classification problem properly. The objective is accomplished by a comprehensive methodology that includes pre-processing methods, optimal feature selection, and ELM-based classification. The goal was to create an accurate and effective model for detecting CKD from healthcare data sets. The full process, which begins with data preparation and ends with ELM-based classification, is shown in Fig. 1. This method can improve CKD detection accuracy significantly and change how CKD patient care is offered.

Missing value imputation
The dataset posed a major challenge in this inquiry because a considerable portion of the data was missing. To address this problem, we applied the Predictive Mean Matching (PMM) method within a multiple imputation framework, using the MICE package in R. This approach produced a cleaned dataset, a copy of which was stored in CSV format. The cleaned dataset was used in the next steps, where we developed different Python models, supporting the credibility of our predictions and classifications. The steps presented below follow the sequential procedure outlined in the study 47 .
The sequential steps for PMM are as follows:
Step 1: Using a set of regression models, we estimated the predicted means $y_i$ for all subjects with observed values of the variable $y$, and the posterior predicted means $y_o^*$ for the subjects whose values were missing. Here, the subscript $o$ denotes a case with missing data.
Step 2: We selected a set of $K$ candidate donors for which the distance $d(o, i) = |y_o^* - y_i|$ is minimal. In this context, a donor is a case with no missing values.
Step 3: A single donor was randomly picked from the list of candidate donors, and its observed value was used to impute the missing value of recipient $o$.
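The three PMM steps can be sketched in Python for a single numeric variable. This is a simplified illustration and not the MICE implementation used in the study: it relies on one ordinary least-squares fit rather than a posterior draw of the regression coefficients, and the function name `pmm_impute`, the donor count `k`, and the toy data are all assumptions made for the example.

```python
import numpy as np

def pmm_impute(X, y, k=5, rng=None):
    """Impute NaNs in y by predictive mean matching on predictors X (simplified sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    observed = ~np.isnan(y)
    # Step 1: fit a regression on the observed cases and predict a mean for every case.
    beta, *_ = np.linalg.lstsq(X1[observed], y[observed], rcond=None)
    predicted = X1 @ beta
    donor_pred, donor_y = predicted[observed], y[observed]
    imputed = y.copy()
    for o in np.flatnonzero(~observed):
        # Step 2: keep the k donors whose predicted mean is closest to the recipient's.
        distance = np.abs(donor_pred - predicted[o])
        nearest = np.argsort(distance)[:k]
        # Step 3: draw one donor at random and copy its observed value.
        imputed[o] = donor_y[rng.choice(nearest)]
    return imputed

# Toy data (hypothetical): two predictors, three values removed at random positions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=50)
y_missing = y.copy()
y_missing[[4, 17, 30]] = np.nan
y_imputed = pmm_impute(X, y_missing, k=5, rng=rng)
```

Because each imputed value is copied from a real donor, the imputed data can only contain values that were actually observed, which is the main practical appeal of PMM for clinical measurements.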

Data transformation
In our data pre-processing pipeline, we handled the conversion of nominal features. These features frequently consist of values that are essentially non-numeric, such as "yes/no," "good/poor," "present/not present," or "normal/abnormal." To integrate them into our analytical methods, we transformed them into a binary representation denoted by the digits "1" and "0." To achieve this, we used the label encoding technique, a function available in the Sklearn package for Python. This transformed nominal categorical variables into a numerical representation compatible with the subsequent data analysis and modeling stages.
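As an illustration of this encoding, the sketch below maps each distinct category to an integer code in sorted order, which is how scikit-learn's `LabelEncoder` assigns codes. It is written in plain Python so that it runs without dependencies; the helper `label_encode` and the column names and values are hypothetical examples, not the study's code.

```python
def label_encode(values):
    """Map each distinct category to an integer, in sorted order of the categories."""
    classes = sorted(set(values))
    mapping = {category: code for code, category in enumerate(classes)}
    return [mapping[v] for v in values], mapping

# Hypothetical nominal columns of the kind described above.
records = {
    "htn": ["yes", "no", "no", "yes"],
    "appet": ["good", "poor", "good", "good"],
    "rbc": ["normal", "abnormal", "normal", "normal"],
}

encoded, mappings = {}, {}
for name, column in records.items():
    encoded[name], mappings[name] = label_encode(column)
```

Since the codes follow alphabetical order, "abnormal" becomes 0 and "normal" becomes 1, while "no" becomes 0 and "yes" becomes 1. If 1 should always denote the abnormal finding, an explicit mapping per column may be preferable.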

Data normalization
Data normalization rescales feature values to a common range so that features measured on different scales contribute comparably to the model.
In this stage, the training and test sets were brought to a consistent scale using min-max scaling. The equation for min-max normalization, as stated in reference 48 , is:

$$X_{norm} = \frac{X_0 - X_{min}}{X_{max} - X_{min}}$$

The variable $X_{norm}$ signifies the normalized value of variable $X$ following transformation. The symbol $X_0$ signifies the present value of variable $X$. $X_{max}$ signifies the maximum value inside the dataset, and $X_{min}$ signifies the minimum value within the dataset.
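A minimal NumPy sketch of min-max scaling follows. It assumes that the minimum and maximum are taken from the training data and then reused for the test data, which is one common way to keep both sets on a consistent scale; the text does not state which statistics were used. The function names and numbers are illustrative.

```python
import numpy as np

def fit_min_max(X_train):
    """Return the per-feature minimum and maximum of the training data."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.min(axis=0), X_train.max(axis=0)

def apply_min_max(X, x_min, x_max):
    """Scale X with X_norm = (X_0 - X_min) / (X_max - X_min), feature by feature."""
    X = np.asarray(X, dtype=float)
    # Constant features would give a zero denominator; use 1.0 there instead.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span

# Hypothetical values for two features on very different scales.
X_train = np.array([[48.0, 1.2], [62.0, 3.4], [35.0, 0.8]])
X_test = np.array([[50.0, 2.0]])

x_min, x_max = fit_min_max(X_train)
X_train_norm = apply_min_max(X_train, x_min, x_max)
X_test_norm = apply_min_max(X_test, x_min, x_max)
```

With this choice, test values that fall outside the training range map slightly outside [0, 1], which is expected behavior rather than an error.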

Dataset splitting
After normalizing the dataset, we partitioned it into two sets: 80% of the data was used for training, and the remaining 20% was used for testing. The partition used a stratified split, which preserved the proportion of CKD and non-CKD cases in both sets. The model was therefore trained and tested on data with the actual class proportions, which supports more reliable accuracy estimates.
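The stratified 80/20 split can be sketched as follows. The helper `stratified_split`, the seed, and the class counts in the example are assumptions for illustration; the idea is simply to sample the test portion separately within each class.

```python
import numpy as np

def stratified_split(y, test_fraction=0.2, seed=0):
    """Return sorted train and test indices that preserve the class proportions of y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        # Shuffle the indices of this class, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)

# Hypothetical label vector: 1 = CKD, 0 = non-CKD.
y = np.array([1] * 250 + [0] * 150)
train_idx, test_idx = stratified_split(y)
```

In this example both subsets keep the same share of positive cases as the full label vector, so neither set over-represents one class.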

Feature selection
In this module, the BGWO method selects a discriminating subset of features according to a given criterion. Feature selection is a necessary step. The traditional feature selection process consists of four main steps: subset generation, subset evaluation, a stopping criterion, and validation of the results. First, a candidate feature subset is generated using the BGWO algorithm, which is based on the hunting behavior of wolves. Next, each candidate subset is evaluated against the current subset using the criterion, and the current subset is replaced if the new one is better.
Subset generation and evaluation are repeated until a stopping criterion is met. The subset selected as the best is then validated using either prior knowledge or tests on past datasets.

Feature subset optimization
Feature selection is a critical aspect of ML, and it significantly impacts dataset quality. Excluding unnecessary features shortens training time, simplifies model development, and improves the understanding of the data 49 . In BGWO-based subset optimization, candidate feature subsets are represented as 'grey wolves', which improves both the interpretability and the performance of the model. Feature subset optimization is important because classification performance can peak: beyond that point, adding features increases overfitting, since many of the features carry redundant information. The BGWO algorithm therefore helps identify the features that contribute most and improves the understanding of the data.

Grey Wolf optimization algorithm
The approach is a population-based computational optimization technique rooted in evolutionary computing and modeled on the hunting behavior of grey wolves 25 . The social structure of a grey wolf pack inspired the technique. Normally, the pack comprises 5-12 members with comparatively high intelligence. Within the social structure of the pack, grey wolves are classified into four groups, namely alpha ($\alpha$), beta ($\beta$), delta ($\delta$), and omega ($\omega$), based on the prevailing hierarchy. Alphas are responsible for decisions about hunting, rest, and activity, whereas betas provide assistance and support. Deltas are subordinate to alphas and betas but can exert influence over omegas, who must comply with the directives of superior wolves. The model of grey wolf predation has two distinct processes, as stated by 25 . Initially, the wolves encircle the target, as described by

$$\vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right|$$

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D}$$

where the variable $t$ signifies the iteration, the vector $\vec{X}$ signifies the position of the wolf, and $\vec{X}_p$ signifies the position of the target. Additionally, $\vec{A}$ and $\vec{C}$ are coefficient vectors:

$$\vec{A} = 2a\,\vec{r}_1 - a, \qquad \vec{C} = 2\,\vec{r}_2$$

The value of $a$ decreases linearly from two to zero as the number of iterations grows. The vectors $\vec{r}_1$ and $\vec{r}_2$ are randomly generated within the range [0, 1]. Within the framework of the GWO method, the designations "alpha," "beta," and "delta" are assigned to the candidate solutions based on their relative performance. The alpha is regarded as the best solution, the beta as the second-best, and the delta as the third-best. Alphas, betas, and deltas are assumed to possess the most knowledge about the location of the prey. Once these best positions are known, the other search agents, including the omegas, must revise their positions as well. To enhance their predatory efforts, wolves must therefore update their positions, especially those occupying the omega position within the pack hierarchy:

$$\vec{X}(t+1) = \frac{\vec{x}_1 + \vec{x}_2 + \vec{x}_3}{3}$$

where the vectors $\vec{x}_1$, $\vec{x}_2$, and $\vec{x}_3$ are computed by:

$$\vec{x}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \qquad \vec{x}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \qquad \vec{x}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta$$

The first three optimal solutions for every iteration are denoted as $\vec{X}_\alpha$, $\vec{X}_\beta$, and $\vec{X}_\delta$, and the distances are

$$\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|, \qquad \vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|, \qquad \vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right|$$

The vectors $\vec{C}_1$, $\vec{C}_2$, and $\vec{C}_3$ are computed using Eq. (7). The process is repeated until the wolves capture the prey.
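To make the position update concrete, the following sketch implements one iteration of the standard continuous GWO update and applies it to a simple test function. It assumes a minimization problem, uses the population size (15) and iteration count (100) from the experimental setup, and takes the sphere function and the search range as stand-ins chosen only for illustration. The function name `gwo_step` is hypothetical.

```python
import numpy as np

def gwo_step(positions, fitness, t, max_iter, rng):
    """One GWO iteration (minimization); positions has shape (n_wolves, dim)."""
    scores = np.array([fitness(p) for p in positions])
    leaders = positions[np.argsort(scores)[:3]]   # alpha, beta, delta
    a = 2.0 * (1.0 - t / max_iter)                # decreases linearly from 2 towards 0
    new_positions = np.empty_like(positions)
    for i, x in enumerate(positions):
        candidates = []
        for leader in leaders:
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            A = 2.0 * a * r1 - a                  # coefficient vector A
            C = 2.0 * r2                          # coefficient vector C
            D = np.abs(C * leader - x)            # distance to this leader
            candidates.append(leader - A * D)     # x1, x2, x3 in turn
        new_positions[i] = np.mean(candidates, axis=0)
    return new_positions

def sphere(x):
    """Illustrative objective with its minimum at the origin."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(1)
wolves = rng.uniform(-5.0, 5.0, size=(15, 4))
initial_best = min(sphere(w) for w in wolves)
max_iter = 100
for t in range(max_iter):
    wolves = gwo_step(wolves, sphere, t, max_iter, rng)
final_best = min(sphere(w) for w in wolves)
```

Early on, large values of `a` allow steps that overshoot the leaders (exploration); as `a` shrinks, each wolf settles near the average of the three leaders (exploitation).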

Binary Grey Wolf optimization
In the context of the GWO algorithm, wolves can dynamically alter their locations to locate and capture prey effectively.However, some tasks, such as feature selection, provide a binary space issue where the solution is constrained to values of either zero or one.This poses a challenge for the conventional GWO algorithm.

The position of each wolf is updated in binary space as

$$x_d^{t+1} = \begin{cases} 1, & \text{if } \mathrm{sigmoid}\left(\dfrac{x_1 + x_2 + x_3}{3}\right) \ge rand \\ 0, & \text{otherwise} \end{cases}$$

where $rand$ represents a random number drawn from a uniform distribution on [0, 1], and $x_d^{t+1}$ represents the updated binary position in dimension $d$ at iteration $t$. The sigmoid is formally specified as

$$\mathrm{sigmoid}(a) = \frac{1}{1 + e^{-10(a - 0.5)}}$$

The variables $x_1$, $x_2$, and $x_3$ are binary vectors that represent the outcome of wolf movement in the direction of the alpha, beta, and delta grey wolves, respectively. They are obtained from Eqs. (6), (12), (13), and (14). In this way, BGWO updates every position using the best feature subsets found so far. Algorithm 1 delineates the BGWO pseudocode.
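One common way to turn the averaged GWO step into bits, used in binary GWO variants, is a steep sigmoid followed by a comparison with a uniform random number. The sketch below assumes that form, with slope 10 and midpoint 0.5; these constants, the function names, and the example vectors are assumptions and may differ from the exact variant used in the study.

```python
import numpy as np

def sigmoid(a):
    """Steep sigmoid centred at 0.5 (assumed form), mapping a step to a probability."""
    return 1.0 / (1.0 + np.exp(-10.0 * (a - 0.5)))

def binarize(x1, x2, x3, rng):
    """Set each dimension to 1 when the sigmoid of the mean step beats a uniform draw."""
    mean_step = (x1 + x2 + x3) / 3.0
    return (sigmoid(mean_step) >= rng.random(mean_step.shape)).astype(int)

rng = np.random.default_rng(0)
# Hypothetical steps towards the alpha, beta, and delta wolves for four features.
x1 = np.array([1.0, 0.0, 1.0, 0.0])
x2 = np.array([1.0, 0.0, 0.0, 0.0])
x3 = np.array([1.0, 0.0, 1.0, 1.0])
mask = binarize(x1, x2, x3, rng)
```

A feature on which all three leaders agree is selected with probability close to 1 (or close to 0), while a feature on which they disagree is decided mostly by the random draw, which keeps some exploration in the search.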
Algorithm 1: The proposed pseudocode of BGWO
Initialize a population of wolves' positions randomly in the range [0, 1].
While stopping criteria are not met, do
    For each wolf in the pack, do the following:
    End for
    For each wolf in the pack, do the following:
    End for
End While
Output: Identify optimal selected features based on the final wolf positions that optimize the objective functions

The solution in this research is represented as a one-dimensional binary vector whose dimension corresponds to the number of features. Within this vector, the values 0 and 1 have the following meanings:
0: The feature has not been chosen.
1: The feature has been chosen.
The process of feature selection inherently has a dual-objective aspect. One primary goal is to decrease the number of features, while the other is to improve classification precision. To achieve both goals simultaneously, the fitness function incorporates the following equations, applying the KNN classifier described in 25 and 50 .
$$Fitness = \alpha\,\rho_R(D) + \beta\,\frac{|S|}{|T|}$$

The parameters $\alpha \in [0, 1]$ and $\beta = (1 - \alpha)$ are adopted from 25 . The term $\rho_R(D)$ designates the error rate of the KNN classifier. Furthermore, $|S|$ represents the size of the selected feature subset, whereas $|T|$ denotes the total number of features in the data set.
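A fitness function of this kind, a weighted sum of the KNN error rate and the fraction of selected features, can be sketched as below. The small NumPy KNN, the values `alpha=0.99` and `k=5`, the penalty of 1.0 for an empty subset, and the synthetic data are all assumptions for illustration, not the study's settings.

```python
import numpy as np

def knn_error_rate(X_tr, y_tr, X_va, y_va, k=5):
    """Error rate of a k-nearest-neighbour majority vote for binary labels 0/1."""
    dist = ((X_va[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    pred = (y_tr[nearest].mean(axis=1) >= 0.5).astype(int)
    return float(np.mean(pred != y_va))

def fitness(mask, X_tr, y_tr, X_va, y_va, alpha=0.99, k=5):
    """alpha * error + (1 - alpha) * |S| / |T|; lower is better."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0  # assumed penalty: an empty subset gets the worst possible value
    error = knn_error_rate(X_tr[:, mask], y_tr, X_va[:, mask], y_va, k)
    return float(alpha * error + (1.0 - alpha) * mask.sum() / mask.size)

# Synthetic data (hypothetical): features 0-1 carry the label, features 2-5 are noise.
rng = np.random.default_rng(0)
n = 120
y = rng.integers(0, 2, size=n)
informative = y[:, None] + 0.3 * rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 4))
X = np.hstack([informative, noise])
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

fit_informative = fitness([1, 1, 0, 0, 0, 0], X_tr, y_tr, X_va, y_va)
fit_noise = fitness([0, 0, 1, 1, 1, 1], X_tr, y_tr, X_va, y_va)
fit_empty = fitness([0, 0, 0, 0, 0, 0], X_tr, y_tr, X_va, y_va)
```

With `alpha` close to 1, the error term dominates and the subset-size term mainly breaks ties between subsets of similar accuracy in favor of the smaller one.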
After successfully integrating the optimal feature selection segment, disease detection is conducted using the classifier. The ELM technique uses a classification methodology to ascertain the existence or non-existence of CKD by analyzing medical data.

CKD data classification
The features selected through BGWO are employed in the CKD classification phase. In this phase, we emphasize training an ELM model to classify CKD.

Extreme learning machine (ELM)
The ELM 51 is a highly adaptable feed-forward neural network often employed for many computational tasks, including classification, regression, and clustering. The ELM can have either a single hidden layer or multiple hidden layers. While a single hidden layer can suffice for simpler problems, providing rapid training and reduced computational demands, it may not perform adequately for more complex datasets, where multiple layers could capture deeper patterns and interactions within the data. The proposed model consists of input nodes connected to the hidden nodes and output nodes that form the final output. Similar to other neural networks, rectified linear units activate the hidden nodes. The key feature of our algorithm is that the hidden node parameters, namely the biases and weights, are fixed. They can either be kept unaltered or be transferred as they are. This differs from the back-propagation algorithm, a common approach used to train neural networks. While effective, back-propagation is limited because weights require continuous updates; the algorithm does not consider the weights' magnitudes and tends to get stuck in local minima. In addition, we adjusted the magnitude of the weights and biases to prevent over-fitting. Dropout is applied during the training phase to keep the model from overfitting the training vectors. During testing, however, all input node weights are restored, and the signals arriving at the hidden nodes are weighted accordingly. The number of weights connecting the input and hidden nodes therefore remains unchanged. The ELM is illustrated in Fig. 2. ELM also learns faster than networks trained with back-propagation. Finally, using validation, we can monitor the learning process and ensure that the model's complexity allows it to generalize to new data.
Consider a set of $H$ random samples $(p_i, t_i)$, where $p_i = [p_{i1}, p_{i2}, \ldots, p_{in}]^T \in Q^n$ and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in Q^m$. The basic single-hidden-layer feed-forward neural network (SLFN) with $G$ hidden nodes and an activation function $f(\cdot)$ may be mathematically stated as:

$$\sum_{i=1}^{G} w_i\, f(a_i \cdot p_j + c_i) = o_j, \qquad j = 1, \ldots, H$$

Here $a_i = [a_{i1}, a_{i2}, \ldots, a_{in}]^T$ is the weight vector linking the $i$th hidden node with the input nodes, and $w_i = [w_{i1}, w_{i2}, \ldots, w_{im}]^T$ is the weight vector linking the $i$th hidden node to the output nodes. The variable $c_i$ signifies the threshold of the $i$th hidden node. Additionally, $o_j = [o_{j1}, o_{j2}, \ldots, o_{jm}]^T$ signifies the output vector that the SLFN produces for the $j$th sample.
An SLFN with $G$ hidden nodes and an activation function $f(\cdot)$ can approximate the $H$ samples with zero error. The condition $\sum_{j=1}^{H} \| o_j - t_j \| = 0$ represents an exact fit, indicating that the total discrepancy between the network outputs $o_j$ and their respective target values $t_j$ is equal to zero. This is possible when there exist weight vectors $w_i$, input weight vectors $a_i$, and hidden node thresholds $c_i$ such that

$$\sum_{i=1}^{G} w_i\, f(a_i \cdot p_j + c_i) = t_j, \qquad j = 1, \ldots, H$$

The equation above may be concisely stated as:

$$M w = T$$

where

$$M = \begin{bmatrix} f(a_1 \cdot p_1 + c_1) & \cdots & f(a_G \cdot p_1 + c_G) \\ \vdots & \ddots & \vdots \\ f(a_1 \cdot p_H + c_1) & \cdots & f(a_G \cdot p_H + c_G) \end{bmatrix}_{H \times G}, \qquad w = \begin{bmatrix} w_1^T \\ \vdots \\ w_G^T \end{bmatrix}_{G \times m}, \qquad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_H^T \end{bmatrix}_{H \times m}$$

The term $M$ represents the output matrix of the hidden layer. In matrix $M$, the $k$th column corresponds to the output produced by the $k$th hidden node for the inputs $p_1, p_2, \ldots, p_H$. The solution of the linear system may be written as:

$$w = M^{-1} T$$

where $M^{-1}$ denotes the Moore-Penrose generalized inverse of the matrix $M$. The ELM's output function is defined as:

$$f_G(p) = \sum_{i=1}^{G} w_i\, f(a_i, c_i, p)$$

In the context of ELM training, three parameters are of significance: the training set, $K = \{(p_j, t_j) \mid p_j \in Q^n,\ t_j \in Q^m,\ j = 1, 2, \ldots, H\}$; the output function of the hidden nodes, $f(a_i, c_i, p)$; and the number of hidden nodes, $G$. The ELM training procedure may commence once all parameters have been properly set.
The Extreme Learning Machine starts its training process by generating random values for the $G$ pairs of hidden node parameters $(a_i, c_i)$. The output matrix $M$ is then created using Eq. (24). Given the input data and these randomly generated parameters, the ELM can then evaluate the output weight vector $w$ with the help of Eq. (28). Once the training process is completed, the model can be applied to predict the results for the test data tuples using Eq. (29). The ELM training process can thus be summarized as follows, given the training set $K$, the activation function $f(x)$, and the number of hidden nodes $G$:
Step 1: Assign random values to the input weights $a_i$ and thresholds $c_i$.
Step 2: Compute the hidden layer output matrix $M$.
Step 3: Compute the output weight vector as $w = M^{-1} T$.
This structured training process allows the ELM to analyze and classify the collected data effectively, thereby allowing accurate predictions for new samples.
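The training procedure described above (random hidden parameters, hidden layer output matrix, Moore-Penrose solution for the output weights) can be sketched in NumPy as follows. The class name `ELM`, the uniform range [-1, 1] for the random parameters, the choice of 50 hidden nodes, and the two-cluster toy data are assumptions for illustration; the sigmoid activation follows the experimental setup.

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM sketch: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, P):
        # Hidden layer output matrix M with a sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(P @ self.a + self.c)))

    def fit(self, P, T):
        n_features = P.shape[1]
        # Step 1: random input weights a and thresholds c, never updated afterwards.
        self.a = self.rng.uniform(-1.0, 1.0, size=(n_features, self.n_hidden))
        self.c = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Step 2: hidden layer output matrix.
        M = self._hidden(P)
        # Step 3: output weights via the Moore-Penrose generalized inverse.
        self.w = np.linalg.pinv(M) @ T
        return self

    def predict(self, P):
        return np.argmax(self._hidden(P) @ self.w, axis=1)

# Toy two-class data (hypothetical): two Gaussian clusters with one-hot targets.
rng = np.random.default_rng(0)
P = np.vstack([rng.normal(loc=-2.0, size=(100, 2)), rng.normal(loc=2.0, size=(100, 2))])
labels = np.array([0] * 100 + [1] * 100)
T = np.eye(2)[labels]
order = rng.permutation(200)
train, test = order[:160], order[160:]

model = ELM(n_hidden=50).fit(P[train], T[train])
test_accuracy = float(np.mean(model.predict(P[test]) == labels[test]))
```

Training involves no iterative weight updates: the only fitted quantity is `w`, obtained in a single linear solve, which is why ELM training is typically fast compared with back-propagation.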

Experimental setup
The presence of chronic kidney disease (CKD) in the dataset was determined using BGWO feature selection and the ELM classifier, implemented in MATLAB/Simulink on a workstation equipped with an Intel Core i7 2.60 GHz CPU and 8 GB RAM. The performance metrics, namely accuracy, recall, specificity, kappa, and F-score, were evaluated by comparing the actual results with the predicted outcomes.
The fundamental parameter configurations are as follows. BGWO was run with a population size of 15 wolves, aiming for a balance between exploration and exploitation. A maximum of 100 iterations was set. For the ELM, we conducted experiments to evaluate the impact of varying the number of hidden nodes from 50 to 250. Additionally, we applied the sigmoid activation function to enhance the classification capabilities of the model.

Evaluation criteria
The key purpose of this study is to classify an input sample as belonging to either the positive class or the negative class. Four prediction outcomes are possible:
• True Positive (TP) - the model accurately predicts the positive class.
• True Negative (TN) - the model accurately predicts the negative class.
• False Positive (FP) - the model inaccurately predicts the positive class.
• False Negative (FN) - the model inaccurately predicts the negative class.
Table 2 provides the formulas for the evaluation measures used in our analysis.
Classification Accuracy: This statistic quantifies the ratio of correctly classified data tuples to the overall number of classifications.
Recall (Sensitivity): The recall metric estimates the ratio of correctly predicted positive instances to the overall number of positive cases.
Specificity: Specificity assesses the proportion of actual negative cases that the classifier correctly identifies as negative. It is particularly useful in scenarios where correctly identifying negative cases is vital.
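The measures named in this section can be computed directly from the four outcome counts. The sketch below uses the standard definitions of accuracy, recall, specificity, F-score, and Cohen's kappa; the function name and the counts in the example are hypothetical and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)            # sensitivity: share of actual positives found
    specificity = tn / (tn + fp)       # share of actual negatives found
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
        "kappa": kappa,
    }

# Hypothetical counts for an 80-sample test set.
metrics = classification_metrics(tp=48, tn=29, fp=1, fn=2)
```

Kappa is lower than accuracy for the same counts because it discounts the agreement that a classifier would reach by chance given the class frequencies.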
Out of the 25 features in the CKD dataset, this study used BGWO to select an optimal subset of 15 features. The chosen features and their descriptions are detailed in Table 3.
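To make the feature selection step more concrete, the following is a simplified sketch of a binary grey wolf search over feature masks. It is not the exact position update used in this study: it assumes a sigmoid transfer function of the kind used in common binary GWO variants, a population of 15 wolves, and 100 iterations. The names `bgwo_select`, `toy_fitness`, and `TARGET` are hypothetical, and the toy fitness merely stands in for a classifier-based error measure.

```python
import numpy as np


def bgwo_select(fitness, n_features, n_wolves=15, n_iter=100, seed=0):
    """Search for a binary feature mask; returns (best_mask, best_score).

    Lower fitness is better. Simplified sketch, not the paper's exact update.
    """
    rng = np.random.default_rng(seed)
    wolves = rng.integers(0, 2, size=(n_wolves, n_features))
    scores = np.array([fitness(w) for w in wolves], dtype=float)
    best = int(np.argmin(scores))
    best_mask, best_score = wolves[best].copy(), scores[best]

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        # The three best wolves of this iteration act as alpha, beta, delta.
        leaders = wolves[np.argsort(scores)[:3]].copy()
        for i in range(n_wolves):
            steps = []
            for leader in leaders:
                A = 2.0 * a * rng.random(n_features) - a
                C = 2.0 * rng.random(n_features)
                D = np.abs(C * leader - wolves[i])
                steps.append(leader - A * D)
            x = np.mean(steps, axis=0)  # continuous GWO position
            # Sigmoid transfer maps the continuous position to bit probabilities.
            prob = 1.0 / (1.0 + np.exp(-10.0 * (x - 0.5)))
            wolves[i] = (prob >= rng.random(n_features)).astype(int)
            scores[i] = fitness(wolves[i])
            if scores[i] < best_score:
                best_mask, best_score = wolves[i].copy(), scores[i]
    return best_mask, float(best_score)


# Toy stand-in for a real fitness: distance to a made-up "ideal" mask
# with 15 of 25 features switched on.
TARGET = np.array([1] * 15 + [0] * 10)


def toy_fitness(mask):
    return int(np.sum(mask != TARGET))
```

In practice the fitness would presumably combine classification error with the number of selected features; that choice is an assumption here, since the objective is not restated in this section.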

Performance Evaluation of Feature Selection Algorithms
This section assesses the effectiveness of several feature selection methods. The accuracy of these algorithms was assessed across feature set sizes ranging from 10 to 60, using the CKD database as the testbed.
Figure 3 shows the accuracy of the proposed BGWO, GWO, Particle Swarm Optimization (PSO), Monarch Butterfly Optimization (MBO), and Genetic Algorithm (GA) across varying numbers of features in the CKD dataset. The accuracy assessments were based on the selected features. Table 4 provides a detailed overview of the simulation outcomes for the feature selection algorithms. The findings reveal that the BGWO model consistently performed better than the other feature selection methods, achieving notably high accuracy on the CKD dataset.

Performance analysis of classification techniques
In this phase of our study, we executed several classification techniques, including AdaBoost 52, Naïve Bayesian (NB) 53, Perceptron 54, and k-Nearest Neighbors (KNN) 55. Subsequently, we evaluated the efficacy of the ELM model.

Enhancing ELM performance through variations in hidden nodes
To enhance the accuracy of the ELM, we varied the number of nodes in its hidden layer from a minimum of 50 to a maximum of 250. The experiments revealed that modifying the number of hidden layer nodes led to a notable efficiency improvement in the ELM model. The results are summarized in Table 5 and depicted in Fig. 4. (For Kappa, a value of 1 designates complete agreement, while a value below 1 implies a lesser degree of agreement; for the F-score, the best value is 1 and the worst value is 0.)
Table 5 shows that the ELM attains its highest accuracy (0.9698) when using 200 hidden layer nodes. This is followed by configurations with 250 (0.9668), 150 (0.9575), 100 (0.9461), and 50 (0.9351) hidden layer nodes, respectively. Although increasing the number of nodes from 50 to 200 slightly lowers the Kappa value, it markedly improves the recall and F-score values. Distributing the input processing over more neurons by increasing the number of hidden layer nodes reduces the computational burden per neuron and streamlines operations. Nevertheless, when the number of nodes exceeds a certain threshold, input handling becomes more intricate, leading to a decline in the model's overall performance 56.
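The hidden-node experiment can be illustrated with a minimal ELM sketch. It assumes the basic ELM formulation (random, untrained hidden weights with a sigmoid activation and output weights obtained through the Moore-Penrose pseudo-inverse) and uses synthetic data, not the CKD dataset. The names `ELM`, `make_toy_data`, and `sweep_hidden_nodes` are hypothetical, and the accuracies it produces bear no relation to Table 5.

```python
import numpy as np


class ELM:
    """Basic single-hidden-layer ELM for binary labels in {0, 1}."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the randomly projected inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Hidden weights and biases are drawn once and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights: least-squares solution via the pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)


def make_toy_data(n_samples=400, n_features=15, seed=0):
    # Two Gaussian classes centred at -1 and +1 on every feature (synthetic).
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)
    X = rng.normal(size=(n_samples, n_features)) + (2 * y[:, None] - 1)
    return X, y


def sweep_hidden_nodes(node_counts=(50, 100, 150, 200, 250)):
    """Return test accuracy per hidden-node count on an 80/20 split."""
    X, y = make_toy_data()
    split = int(0.8 * len(y))  # 80% training, 20% testing
    results = {}
    for n_hidden in node_counts:
        model = ELM(n_hidden).fit(X[:split], y[:split])
        results[n_hidden] = float(np.mean(model.predict(X[split:]) == y[split:]))
    return results
```

Calling `sweep_hidden_nodes()` returns a dictionary mapping each node count to its test accuracy, which mirrors the structure of the comparison reported in Table 5.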

Performance comparison of ELM with different classification models
This section compares the performance of several classifiers with that of the ELM classifier, which was constructed with 200 hidden nodes. The dataset was partitioned into two subsets: 80% of the tuples were designated for training, and the remaining 20% were set aside for testing. Multiple evaluation criteria, namely accuracy, Kappa statistics, specificity, recall, and F-score, were employed to facilitate comparison. The results are presented in Table 6, accompanied by a graphical depiction in Fig. 5.
According to the data shown in Table 6, the ELM model has the highest accuracy, at 0.9698. Conversely, the Perceptron model exhibits the lowest accuracy, at 0.8314. After analyzing the models, it became apparent that the proposed ELM exhibited superior performance compared to all traditional classification approaches. The analysis of the CKD dataset is shown in Fig. 6. This dataset consists of two classes, class 1 and class 2, which indicate the existence or non-existence of CKD. The graph displays the classes on the x-axis and the performance measures (accuracy, recall, and specificity) on the y-axis. The proposed technique yielded strong results. For class 1, the accuracy reached 96.8%, the sensitivity 95.10%, and the specificity 94.12%. For class 2, the accuracy was 97.90%, the sensitivity 84.15%, and the specificity 97.90%.
The Receiver Operating Characteristic (ROC) curve is a graphical tool used to analyze the diagnostic ability of classifiers as well as of feature selection strategies. The ROC curve represents the trade-off between sensitivity, defined as the True Positive Rate, and 1-Specificity, the False Positive Rate. The Area Under the Curve (AUC) is a critical indicator of the efficacy of a classification model: the closer the AUC is to 1, the more accurate the model is. ROC curves were plotted in this study using the Chronic Kidney Disease dataset to test the effectiveness of different feature selection strategies. With the ELM classifier, the AUC values were:
• GA: 0.965 ± 0.008
• MBO: 0.966 ± 0.008
• PSO: 0.982 ± 0.006
• The AUC values for GWO and BGWO were not specified but are implied to be significant.
The results indicate that BGWO is an efficient approach. According to the ROC curve given in Fig. 7, when BGWO was combined with the ELM classifier, the AUC value was 0.982 ± 0.006, which suggests that this approach improves classification. This study shows that BGWO can be regarded as a useful feature selection strategy for improving classifier performance and diagnostic accuracy.
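The relationship between the ROC curve and the AUC described above can be sketched in a few lines. This is a generic illustration, not the evaluation code used in the study; the function names `roc_points` and `roc_auc` and the four example scores are hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, for illustration only.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, predicted)  # 3 of the 4 positive/negative pairs are ordered correctly
```

In this toy case three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75.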

Execution time analysis
For the prediction of CKD-affected persons, we examined the execution time of the various tasks, as shown in Fig. 8. We first analyzed the computational cost of different machine learning algorithms, namely AdaBoost, KNN, NB, Perceptron, SVM, and our proposed method, ELM. The simulation started by applying the pre-processing tasks to the input dataset. Following this, the BGWO method was employed to evaluate the features of the dataset. Applying this feature selection technique was essential for reaching the highest level of accuracy in our simulations while reducing the processing time required.
The BGWO process was used to rank the feature subsets and select the best-rated one. The selected features were then fed into each classifier. Throughout this analysis, we monitored the classification performance, aiming for high accuracy, specificity, and recall, as can be seen in Fig. 5.

Computational complexity analysis
This section considers the computational aspects of the proposed method for CKD classification. The solution employs the BGWO algorithm for feature selection and the ELM method for classification. The pre-processing step has a linear time complexity of O(n), where 'n' is the original dimensionality of the input data. The BGWO algorithm has a time complexity of O(k × n × m), where 'k' is the number of iterations, 'n' is the original dimensionality of the input data, and 'm' is the size of the feature space. The ELM model for CKD classification has a time complexity of O(p × (n + m)), where 'p' is the number of hidden neurons, 'n' is the dimensionality of the pre-processed data, and 'm' is the size of the feature selection model. As a result, the combined BGWO-ELM model has an overall complexity of O(k × n × m + p × (n + m)). This indicates that the proposed model is more computationally efficient and precise than earlier methods.
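As a rough illustration of the combined bound, the experimental settings can be substituted into it. The values k = 100 iterations and p = 200 hidden nodes come from the experimental setup; mapping n to the 25 original features and m to the 15 selected features is an assumption made here for the sake of the example, not something the text states.

\[
k \times n \times m + p \times (n + m) = 100 \times 25 \times 15 + 200 \times (25 + 15) = 37{,}500 + 8{,}000 = 45{,}500
\]

Under these assumed values, the feature selection term dominates the classification term by roughly a factor of five. The figures are unit-less operation counts implied by the bound, not measured running times.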

Discussion
Evaluating feature selection algorithms and classification models for diagnosing Chronic Kidney Disease sheds light on what modern machine learning models can offer in medical diagnostics. To this end, the work centers on a comparative analysis of feature selection algorithms. To determine the robustness of these algorithms, they were evaluated at different feature set sizes, from 10 to 60, using the CKD database. BGWO appears to be an effective strategy, as it consistently offers higher accuracy, recall, and specificity levels. This conclusion is noteworthy because it highlights the benefits of the BGWO approach for increasing the overall accuracy of diagnostic models on cumbersome datasets such as the CKD database. The research also considered multiple classification approaches, including AdaBoost, Naïve Bayesian, Perceptron, k-Nearest Neighbor, SVM, and ELM. A critical aspect of this research is the tuning of the ELM model by varying the number of hidden nodes. The results identified 200 hidden nodes as the optimal setting, at which the model showed its highest performance. With this number of hidden nodes, the ELM classifier also achieved the highest accuracy among the classifiers compared. These results indicate that the ELM model, when tuned to the specified parameters, is a highly effective classifier for managing the intricacies of the CKD dataset.
The analysis is deepened by the Receiver Operating Characteristic curve and the Area Under the Curve, which are critical in evaluating the diagnostic accuracy of the applied models. For the ELM classifier, the AUC values are high, which is particularly evident for the BGWO algorithm. The high AUC values denote the strong diagnostic abilities of the considered models, presenting them as viable options in clinical situations where high accuracy is of primary importance. Altogether, the results reveal a variety of intricate aspects of feature selection and classification models in medical diagnostics. They highlight the value of BGWO and ELM for increasing the accuracy and efficiency of CKD diagnosis. Finally, the results are relevant to clinical practice, as they may help practitioners improve the health outcomes of kidney disease patients through early and high-precision diagnosis.

Conclusion
This study analyzes medical data and classification methodologies to identify the early stages of an illness. In particular, it focuses on chronic kidney disease and, more specifically, on the difficulty of selecting the most critical feature subset from the CKD dataset. The methodology included a pre-processing step that addressed missing values in the dataset and applied data transformation and data standardization. The Binary Grey Wolf Optimization method is then used to determine the optimal feature subset. After that, the dataset is divided into two categories, the presence of CKD and its absence, denoted by the corresponding characteristics. At the classification stage, the Extreme Learning Machine is used. To achieve optimal performance and efficiency, the ELM classification was tuned by varying the number of hidden nodes to find a balance between the number of nodes and the accuracy of data processing. As a result, the ELM achieves the highest values of performance characteristics such as sensitivity, specificity, and accuracy. The study's data suggest that the classification accuracy of the ELM method is at a maximal value of 98.90%, which is significantly higher than that of other methods.
Future studies will include improvements to the structure of the Binary Grey Wolf Optimization algorithm, such as defining a fuzzy or improved BGWO algorithm. It is also possible to apply the proposed methodology to a larger number of datasets from various sources and to contact hospital management to obtain a more diverse dataset that would be more robust and likely to include a range of clinically relevant samples. This will make the proposed framework more universal and help solve a wider range of problems than CKD diagnosis alone.

and $\vec{A}_3$ can be determined by Eq. (5). The vectors $\vec{D}_\alpha$, $\vec{D}_\beta$, and $\vec{D}_\delta$ can be derived by: The positions of the alpha, beta, and delta wolves are denoted as $x_\alpha^d$, $x_\beta^d$, and $x_\delta^d$, respectively. Additionally, the values $\mathrm{bstep}_\alpha^d$, $\mathrm{bstep}_\delta^d$, and $\mathrm{bstep}_\beta^d$ are specified by The variables $\mathrm{cstep}_\alpha^d$, $\mathrm{cstep}_\beta^d$, and $\mathrm{cstep}_\delta^d$ are defined as follows. The values of $A_1^d$, $D_\alpha^d$, $D_\beta^d$, and $D_\delta^d$ are computed using Eqs.(

Figure 3. Accuracy analysis of feature selection algorithms.

Figure 4. Graphical representation of the ELM efficacy.
The feature selection strategies compared in the ROC analysis were Genetic Algorithm, Monarch Butterfly Optimization, Particle Swarm Optimization, Grey Wolf Optimizer, and Binary Grey Wolf Optimizer, along with two different classifiers: Extreme Learning Machine and Support Vector Machine. The AUC values provided for each combination give substantial insight into the efficiency of each feature selection approach in combination with each classifier.

Figure 5. Comparative analysis of CKD medical data set classification algorithms.
• The strategies for maximizing feature selection and fine-tuning the hidden layer nodes inside ELM are integrated. This approach significantly boosts disease prediction accuracy, offering more accurate and trustworthy CKD diagnoses.
• The proposed approach is subjected to a rigorous examination, encompassing standard metrics such as sensitivity, specificity, accuracy, precision, kappa, and F-score. The study's results demonstrate that the technique achieves a noteworthy degree of accuracy, exceeding the performance of existing classifier models.
identifying and selecting the essential data criteria in the available CKD data set. The input for the subsequent classification step comprised carefully chosen features.
Stage 3. Data Classification: Therefore, the Binary GWO algorithm has been suggested to conduct feature selection in challenges that include solutions presented in binary form. This study employs two position update methods, namely Position Update Algorithm 1 (PUA1) and Position Update Algorithm 2 (PUA2), as proposed in
Scientific Reports | (2024) 14:12601 | https://doi.org/10.1038/s41598-024-63292-5 | www.nature.com/scientificreports/

Table 2. Metrics for Assessing Classification Model Performance.

Table 3. Acquired optimal features by BGWO algorithm.

Table 4. Performance analysis comparing the proposed feature selection algorithms with existing methods for predicting CKD.

Table 5. Comparative Analysis of ELM Performance across Varied Hidden Layer Nodes.

Table 6. Classification results of the proposed and existing algorithms.